1. Prepare the miniproject

1.1 Grab all .xml, .sh, .sql and .hql files from TrainingOnHDP/DataFlowSchedulingOnOozie/end2end on GitHub and place them in /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end on your HDP sandbox

1.2 Run the following commands via SSH on your VM:

	hadoop fs -mkdir /user/root/oozie
	hadoop fs -mkdir /user/root/oozie/workflow
	hadoop fs -mkdir /user/root/oozie/coordinator
	hadoop fs -mkdir /user/root/oozie/bundle
	hadoop fs -mkdir /user/root/oozie/input
	hadoop fs -mkdir /user/root/oozie/output
	hadoop fs -mkdir /user/root/oozie/lib
	hadoop fs -mkdir /user/root/oozie/scripts
	hadoop fs -mkdir /user/root/oozie/tmp

	hadoop fs -chmod 777 /user/root/oozie
	hadoop fs -chmod 777 /user/root/oozie/workflow
	hadoop fs -chmod 777 /user/root/oozie/coordinator
	hadoop fs -chmod 777 /user/root/oozie/bundle
	hadoop fs -chmod 777 /user/root/oozie/input
	hadoop fs -chmod 777 /user/root/oozie/output
	hadoop fs -chmod 777 /user/root/oozie/lib
	hadoop fs -chmod 777 /user/root/oozie/scripts
	hadoop fs -chmod 777 /user/root/oozie/tmp

	hadoop fs -mkdir /user/yarn
	hadoop fs -mkdir /user/yarn/business
	
	hadoop fs -chmod 777 /user/yarn
	hadoop fs -chmod 777 /user/yarn/business
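	The directory setup above can be collapsed into a loop; a sketch using the same paths (`-p` also creates missing parent directories):

```shell
# create and open up all the HDFS working directories used by the lab
for d in workflow coordinator bundle input output lib scripts tmp; do
  hadoop fs -mkdir -p /user/root/oozie/$d
done
hadoop fs -chmod -R 777 /user/root/oozie

hadoop fs -mkdir -p /user/yarn/business
hadoop fs -chmod -R 777 /user/yarn
```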

	ex -bsc '%!awk "{sub(/\r/,\"\")}1"' -cx /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/wgetfiles.sh
	
	The root password used to access MySQL in /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/preparemysql.sh might need to be changed; it is set via:
	
	--password=hadoop
	
	
	ex -bsc '%!awk "{sub(/\r/,\"\")}1"' -cx /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/preparemysql.sh
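	The `ex`/`awk` one-liners above strip Windows carriage returns from the scripts; an equivalent alternative using GNU `sed` (not from the original lab):

```shell
# remove trailing CR characters in place (same effect as the ex/awk one-liners)
sed -i 's/\r$//' /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/wgetfiles.sh
sed -i 's/\r$//' /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/preparemysql.sh
```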
	
	hadoop fs -rm -skipTrash /user/root/oozie/workflow/oozie_in_action.xml
	hadoop fs -put /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/oozie_in_action.xml /user/root/oozie/workflow

	hadoop fs -rm -skipTrash /user/root/oozie/coordinator/oozie_in_action_coord.xml
	hadoop fs -put /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/oozie_in_action_coord.xml /user/root/oozie/coordinator

	hadoop fs -rm -skipTrash /user/root/oozie/scripts/wgetfiles.sh
	hadoop fs -put /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/wgetfiles.sh /user/root/oozie/scripts
	hadoop fs -chmod 777 /user/root/oozie/scripts/wgetfiles.sh

	hadoop fs -rm -skipTrash /user/root/oozie/scripts/preparemysql.sh
	hadoop fs -put /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/preparemysql.sh /user/root/oozie/scripts
	hadoop fs -chmod 777 /user/root/oozie/scripts/preparemysql.sh

	hadoop fs -rm -skipTrash /user/root/oozie/scripts/preparemysql.sql
	hadoop fs -put /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/preparemysql.sql /user/root/oozie/scripts
	hadoop fs -chmod 777 /user/root/oozie/scripts/preparemysql.sql

	hadoop fs -rm -skipTrash /user/root/oozie/scripts/loadintohive.hql
	hadoop fs -put /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/loadintohive.hql /user/root/oozie/scripts
	hadoop fs -chmod 777 /user/root/oozie/scripts/loadintohive.hql
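	The uploads above repeat one pattern per file; a loop sketch covering the four files that go to /user/root/oozie/scripts (the two workflow/coordinator XML files keep their separate destinations above):

```shell
# refresh each script in HDFS: remove the old copy, upload, open permissions
SRC=/root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end
for f in wgetfiles.sh preparemysql.sh preparemysql.sql loadintohive.hql; do
  hadoop fs -rm -skipTrash /user/root/oozie/scripts/$f
  hadoop fs -put "$SRC/$f" /user/root/oozie/scripts
  hadoop fs -chmod 777 /user/root/oozie/scripts/$f
done
```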

	Remove /tmp/business.csv from your VM if it exists:
	
	rm -f /tmp/business.csv

1.3 Log in to MySQL (mysql -u root -p) and run the following commands:

	1.3.1 CREATE USER 'sqoop_user'@'localhost' IDENTIFIED BY 'password1';
		  GRANT ALL PRIVILEGES ON *.* TO 'sqoop_user'@'localhost' WITH GRANT OPTION;
	      GRANT FILE ON *.* TO 'sqoop_user'@'localhost' WITH GRANT OPTION;
		  FLUSH PRIVILEGES;

	1.3.2 CREATE DATABASE oozie;
		  USE oozie;	
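	Steps 1.3.1-1.3.2 can also be run non-interactively with a heredoc; a sketch assuming the MySQL root password is hadoop, as in preparemysql.sh:

```shell
# run the user/database setup without an interactive mysql session
# (root password assumed to be "hadoop"; adjust if yours differs)
mysql -u root --password=hadoop <<'EOF'
CREATE USER 'sqoop_user'@'localhost' IDENTIFIED BY 'password1';
GRANT ALL PRIVILEGES ON *.* TO 'sqoop_user'@'localhost' WITH GRANT OPTION;
GRANT FILE ON *.* TO 'sqoop_user'@'localhost' WITH GRANT OPTION;
FLUSH PRIVILEGES;
CREATE DATABASE oozie;
EOF
```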
	
1.4 On your VM, run the following command via SSH:

	hadoop credential create mypassword_alias -provider jceks://hdfs/user/root/password.jceks
	
	then type the password for sqoop_user (e.g. password1)
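	To confirm the alias was stored, list the provider's contents (the provider path must match the one used in the create command):

```shell
# list aliases held in the JCEKS keystore; mypassword_alias should appear
hadoop credential list -provider jceks://hdfs/user/root/password.jceks
```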

1.5 On your VM, run the following commands via SSH:
	sudo -su yarn
	sqoop job --delete business_import
	sqoop job -Dhadoop.security.credential.provider.path=jceks://hdfs/user/root/password.jceks \
		--create business_import -- import \
		--connect jdbc:mysql://localhost/oozie --username sqoop_user \
		--table business --fetch-size 10 --as-parquetfile \
		--target-dir /user/yarn/tmp \
		--incremental append -m 2 --last-value 0 --check-column id \
		--password-alias mypassword_alias
	exit
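	A quick sanity check that the saved job exists, run as the same yarn user (--list and --show are standard sqoop job options):

```shell
# business_import should appear in the list; --show prints its definition
sudo -su yarn sqoop job --list
sudo -su yarn sqoop job --show business_import
```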
	
1.6 The following step can be skipped if you are using HDP 2.6.3

	Log in to the Ambari console as admin, go to Oozie > Configs, and add the following to the log4j settings:

	log4j.appender.oozieError=org.apache.log4j.rolling.RollingFileAppender
	log4j.appender.oozieError.RollingPolicy=org.apache.oozie.util.OozieRollingPolicy
	log4j.appender.oozieError.File=${oozie.log.dir}/oozie-error.log
	log4j.appender.oozieError.Append=true
	log4j.appender.oozieError.layout=org.apache.log4j.PatternLayout
	log4j.appender.oozieError.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - SERVER[${oozie.instance.id}] %m%n
	log4j.appender.oozieError.RollingPolicy.FileNamePattern=${log4j.appender.oozieError.File}-%d{yyyy-MM-dd-HH}
	log4j.appender.oozieError.RollingPolicy.MaxHistory=720
	log4j.appender.oozieError.filter.1 = org.apache.log4j.varia.LevelMatchFilter
	log4j.appender.oozieError.filter.1.levelToMatch = WARN
	log4j.appender.oozieError.filter.2 = org.apache.log4j.varia.LevelMatchFilter
	log4j.appender.oozieError.filter.2.levelToMatch = ERROR
	log4j.appender.oozieError.filter.3 = org.apache.log4j.varia.LevelMatchFilter
	log4j.appender.oozieError.filter.3.levelToMatch = FATAL
	log4j.appender.oozieError.filter.4 = org.apache.log4j.varia.DenyAllFilter 

	Then change the log level of the following logger (INFO or WARN by default) to:
	
	log4j.logger.org.apache.oozie=ALL, oozie, oozieError

1.7 Log in as root and run the following commands on your VM:

	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/mysql-connector-java.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/kite-data-core-1.1.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/kite-data-mapreduce-1.1.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/kite-hadoop-compatibility-1.1.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/kite-data-hive-1.1.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop

	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/parquet-avro-1.6.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/parquet-column-1.6.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/parquet-common-1.6.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/parquet-encoding-1.6.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/parquet-format-2.2.0-rc1.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/parquet-generator-1.6.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/parquet-hadoop-1.6.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop
	sudo -su oozie hadoop fs -put /usr/hdp/2.6.3.0-235/sqoop/lib/parquet-jackson-1.6.0.jar /user/oozie/share/lib/lib_20171110144231/sqoop
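	The thirteen copies above can be written as one loop; a sketch that assumes HDP 2.6.3.0-235 and the sharelib timestamp shown (adjust both to your install):

```shell
# copy the MySQL connector plus the Kite and Parquet jars into the Oozie sharelib
SQOOP_LIB=/usr/hdp/2.6.3.0-235/sqoop/lib
SHARELIB=/user/oozie/share/lib/lib_20171110144231/sqoop
for j in mysql-connector-java.jar \
         kite-data-core-1.1.0.jar kite-data-mapreduce-1.1.0.jar \
         kite-hadoop-compatibility-1.1.0.jar kite-data-hive-1.1.0.jar \
         parquet-avro-1.6.0.jar parquet-column-1.6.0.jar parquet-common-1.6.0.jar \
         parquet-encoding-1.6.0.jar parquet-format-2.2.0-rc1.jar \
         parquet-generator-1.6.0.jar parquet-hadoop-1.6.0.jar parquet-jackson-1.6.0.jar
do
  sudo -su oozie hadoop fs -put "$SQOOP_LIB/$j" "$SHARELIB"
done
```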
	

1.8 Restart the Oozie service

1.9 Hive Configuration Change – Out Of Memory Issue

	If Hive queries fail with out-of-memory errors, set:

	hive.tez.java.opts=-Xmx1024m

	Restart Hive

	
2. Launch the Project

	Change the startTime and endTime in coordinator.properties if needed
	
	oozie job -oozie http://127.0.0.1:11000/oozie -config /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/coordinator.properties -run

	Note: 
	
	If you are experiencing the following issue: Error: E0803: IO Error, E0603: SQL in operation

	then remove the Derby lock files and fix ownership:

	ls -l /hadoop/oozie/data/oozie-db
	rm /hadoop/oozie/data/oozie-db/dbex.lck
	rm /hadoop/oozie/data/oozie-db/db.lck
	chown -R oozie:hadoop /hadoop/oozie/data
	chmod 777 /hadoop/oozie/data

	Restart the Oozie service
	
	
3. Create the external Hive table

	CREATE EXTERNAL TABLE IF NOT EXISTS business(
		location_id varchar(100),
		business_account_number INT,
		ownership_name varchar(100),
		dba_name varchar(100),
		street_address varchar(100),
		city varchar(100),
		state varchar(100),
		source_zipcode varchar(100),
		business_start_date varchar(100),
		business_end_date varchar(100), 
		location_start_date varchar(100),
		location_end_date varchar(100),
		mail_address varchar(100),
		mail_city varchar(100),
		mail_zipcode varchar(100), 
		mail_state varchar(100),
		naics_code varchar(100),
		naics_code_description varchar(100),
		parking_tax varchar(100),
		transient_occupancy_tax varchar(100), 
		lic_code varchar(100),
		lic_code_description varchar(100),
		supervisor_district varchar(100),
		neighborhoods_analysis_boundaries varchar(100),
		business_corridor varchar(100),
		business_location varchar(100),
		id int
	)
	PARTITIONED BY(jobid string) 
	STORED AS PARQUET 
	LOCATION "/user/yarn/business";

	MSCK REPAIR TABLE business;
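	After MSCK REPAIR, a quick check that the jobid partitions were registered and the data is queryable:

```shell
# SHOW PARTITIONS lists one line per jobid partition discovered by MSCK REPAIR
hive -e 'SHOW PARTITIONS business; SELECT COUNT(*) FROM business;'
```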


4. The following queries will be used in Zeppelin: 

	Import Oozie In Action.json from /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end into Zeppelin (http://localhost:9995)

	select city, count(*) num_per_city from business group by city;
	select state, count(*) num_per_state from business group by state;


5. If you are experiencing memory issues with Sqoop

	Edit /etc/my.cnf and add the following line:

	max_allowed_packet=1048576

	Restart MySQL:
	/etc/init.d/mysqld restart

6. Set up password-less SSH from oozie@<oozie-host> to root@<remote-host>

6.1 Password-less ssh must be set up from oozie@<oozie-host> to user@<remote-host>

	To configure password-less ssh, switch to the oozie user on the Oozie server host:

	sudo -su oozie

6.2 Run the command below (press Enter at every prompt to keep the defaults)

	ssh-keygen

6.3 On the Oozie host, copy the contents of ~/.ssh/id_rsa.pub and append it to ~/.ssh/authorized_keys on <remote-host>

6.4 Make sure the following permissions are set, or password-less ssh will not work:

	700 on ~/.ssh on both the Oozie host and the remote host
	600 on ~/.ssh/authorized_keys on the remote host
	600 on ~/.ssh/id_rsa on the Oozie host
	644 on ~/.ssh/id_rsa.pub on the Oozie host

6.5 Add the following to /etc/ssh/sshd_config (keep a single PermitRootLogin line; "without-password" permits root login with keys only):

	PermitRootLogin without-password
	RSAAuthentication yes
	PubkeyAuthentication yes

6.6 Restart sshd:

	/etc/rc.d/init.d/sshd restart

6.7 Test ssh from oozie@<oozie-host> to <username>@<remote-host>; the ssh command should succeed without a password prompt.


7. Some commands for oozie job

7.1. Submit and run an Oozie workflow with the command below

	oozie job -oozie http://127.0.0.1:11000/oozie -config /root/TrainingOnHDP/DataFlowSchedulingOnOozie/end2end/job.properties -run

	Note: if you hit "Error: E0803: IO Error, E0603: SQL in operation", apply the same fix described in section 2: remove dbex.lck and db.lck from /hadoop/oozie/data/oozie-db, run chown -R oozie:hadoop /hadoop/oozie/data and chmod 777 /hadoop/oozie/data, then restart the Oozie service.

7.2. Open the Oozie Web UI at the URL below

	http://127.0.0.1:11000/oozie/?user.name=admin


7.3 Checking the Status of a Workflow, Coordinator or Bundle Job or a Coordinator Action

	oozie job -oozie http://127.0.0.1:11000/oozie -info 15-20090525161321-oozie

7.4 Killing a Workflow, Coordinator or Bundle Job

	oozie job -oozie http://127.0.0.1:11000/oozie -kill 15-20090525161321-oozie

7.5 Suspending a Workflow, Coordinator or Bundle Job

	oozie job -oozie http://127.0.0.1:11000/oozie -suspend 15-20090525161321-oozie

7.6 Starting a Workflow, Coordinator or Bundle Job

	oozie job -oozie http://127.0.0.1:11000/oozie -start 15-20090525161321-oozie

7.7 Submitting a Workflow, Coordinator or Bundle Job

	oozie job -oozie http://127.0.0.1:11000/oozie -config /training/apps/oozie/job.properties -submit

7.8 Resuming a Workflow, Coordinator or Bundle Job

	oozie job -oozie http://127.0.0.1:11000/oozie -resume 15-20090525161321-oozie
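7.9 To find the job IDs used by the commands above, list recent jobs (standard `oozie jobs` options):

```shell
# list the 10 most recent workflow, coordinator, and bundle jobs
oozie jobs -oozie http://127.0.0.1:11000/oozie -jobtype wf -len 10
oozie jobs -oozie http://127.0.0.1:11000/oozie -jobtype coordinator -len 10
oozie jobs -oozie http://127.0.0.1:11000/oozie -jobtype bundle -len 10
```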

